Improving List Experiments

 

Gustavo Diaz
McMaster University
gustavodiaz.org

 

Slides: talks.gustavodiaz.org/nu

Questions

  • Have you lied about having COVID symptoms?

  • Would you bribe a police officer to avoid a traffic ticket?

  • Have you been offered goods or favors for your vote?

  • Do you know anyone with ties to a militant organization?

  • Would you oppose a black family moving next door?

  • Would you allow Muslim immigrants to become citizens?

What do these have in common?

  • They are sensitive questions

  • We can only learn about them using surveys

  • But asking about them directly leads to misreporting

  • This form of measurement error is called sensitivity bias

Techniques to reduce sensitivity bias

  • Honesty appeals

  • Confidentiality protocols

  • Randomized response

  • Network scale-up

  • Endorsement experiments

  • List experiments


Example

List experiment

Here is a list of things that some people have done. Please listen to them and then tell me HOW MANY of them you have done in the past two years. Do not tell me which ones. Just tell me HOW MANY:

 

Control group

  1. Discussed politics with family or friends
  2. Cast a ballot for governor Phil Bryant
  3. Paid dues to a union
  4. Given money to a Tea Party candidate

List experiment

Do not tell me which ones. Just tell me HOW MANY:

 

Treatment group

  1. Discussed politics with family or friends
  2. Cast a ballot for governor Phil Bryant
  3. Paid dues to a union
  4. Given money to a Tea Party candidate
  5. Voted “YES” on the Personhood Initiative

Prevalence rate

\[ \text{Proportion(Voted yes)} =\\ \text{Mean(List with sensitive item)} -\\ \text{Mean(List without sensitive item)} \]

library(estimatr)
difference_in_means(count ~ sensitive, data = df)

 

We get a prevalence rate estimate but we do not know how individual respondents voted!
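The difference-in-means logic is easy to check by simulation. Here is a minimal Python sketch with made-up data and an assumed 30% prevalence (illustrative only, not the Mississippi study):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
true_prevalence = 0.3  # assumed for the simulation

# How many of the 4 baseline items each respondent has done
baseline = rng.binomial(4, 0.5, size=n)
holds_sensitive = rng.random(n) < true_prevalence

# Random assignment: the treatment group sees the list with the sensitive item
sensitive = rng.random(n) < 0.5
count = baseline + (sensitive & holds_sensitive)

# Prevalence rate = difference in mean counts across groups
tau_hat = count[sensitive].mean() - count[~sensitive].mean()
```

With this sample size, `tau_hat` lands close to the assumed 0.3 even though no individual response reveals who holds the sensitive trait.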

Compare with direct question

Did you vote YES or NO on the Personhood Initiative, which appeared on the November 2011 Mississippi General Election Ballot?

\[ \text{Proportion(Voted yes)} =\\ \text{Mean(Answer yes)} \]

t.test(yes ~ 1, data = df)

Validation

Sensitivity bias reduction not always worth the increased variance

Can we do better?

Double list experiment

List A

  • Californians for Disability (advocating for people with disabilities)
  • California National Organization for Women (advocating for women’s equality and empowerment)
  • American Family Association (advocating for pro-family values)
  • American Red Cross (humanitarian organization)

List B

  • American Legion (veterans service organization)
  • Equality California (gay and lesbian advocacy organization)
  • Tea Party Patriots (conservative group supporting lower taxes and limited government)
  • Salvation Army (charitable organization)

Sensitive item

Organization X (advocating for immigration reduction and measures against undocumented immigration)

  • Everyone sees it

  • Randomly appears in list A or B

  • Equivalent to two parallel list experiments

Three prevalence estimators

\[ \hat{\tau}_A = \text{Mean}(A_t) - \text{Mean}(A_c) \]

difference_in_means(count ~ sensitive, subset = list == "A", data = df)

\[ \hat{\tau}_B = \text{Mean}(B_t) - \text{Mean}(B_c) \]

difference_in_means(count ~ sensitive, subset = list == "B", data = df)

\[ \hat{\tau}_{Pooled} = (\hat{\tau}_A + \hat{\tau}_B)/2 \]

lm_robust(count ~ sensitive + list, clusters = id, data = df)
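The three estimators can be checked on simulated data. A hedged Python sketch with an assumed 25% prevalence (variable names are illustrative, not from the study):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
prev = 0.25  # assumed prevalence

holds = rng.random(n) < prev
base_a = rng.binomial(4, 0.5, size=n)  # baseline count, list A
base_b = rng.binomial(4, 0.5, size=n)  # baseline count, list B

# The sensitive item appears in list A or list B at random
in_a = rng.random(n) < 0.5
count_a = base_a + (in_a & holds)
count_b = base_b + (~in_a & holds)

tau_a = count_a[in_a].mean() - count_a[~in_a].mean()   # list A experiment
tau_b = count_b[~in_a].mean() - count_b[in_a].mean()   # list B experiment
tau_pooled = (tau_a + tau_b) / 2
```

Because every respondent is in treatment for one list and control for the other, the pooled estimator uses twice the information per respondent, which is where the precision gain comes from.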

DLE yields more precise estimates


But variance reduction is not free


DLE variants

List order    Sensitive item location
Fixed         Fixed
Randomized    Fixed
Fixed         Randomized
Randomized    Randomized

  • Fixed-fixed is not an admissible design
  • Randomized-fixed keeps the sensitive item in the second list
  • Fixed-randomized and randomized-randomized shuffle the position of the sensitive item

Carryover design effects

Design effect (Blair and Imai 2012)

The inclusion of a sensitive item affects how survey participants respond to the baseline items within the list.

Carryover design effect

The inclusion of a sensitive item in one list affects how participants respond to the baseline items in the other list.

Toy example

Observed response     List 1   List 2   Difference
Baseline              2        2        0
Deflation
  Sensitive first     1        1        0
  Sensitive second    2        1        1
Inflation
  Sensitive first     3        3        0
  Sensitive second    2        3        -1

Why does this happen?

  • Unique question format
  • Lists usually appear close to each other
  • Positive correlation across lists (Glynn 2013)

Goal: Detect asymmetric shift across treatment schedules

Statistical tests

  1. Difference-in-differences

  2. Signed-rank test


Difference-in-differences test

\[ \hat{\tau}_1 = \text{Mean}(\text{First list}_t) - \text{Mean}(\text{First list}_c) \]

\[ \hat{\tau}_2 = \text{Mean}(\text{Second list}_t) - \text{Mean}(\text{Second list}_c) \]

  • \(H_0: \tau_1 - \tau_2 = 0\) (stated for the population effects, not the estimates)
  • Deflation: \(\tau_1 - \tau_2 < 0\)
  • Inflation: \(\tau_1 - \tau_2 > 0\)
# For fixed-randomized design
lm_robust(count ~ sensitive*first, clusters = id, data = df)
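Under the null of no carryover, the two difference-in-means estimates should agree. A Python sketch of the test statistic on simulated data with no carryover built in (all names and numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8_000
prev = 0.2  # assumed prevalence

holds = rng.random(n) < prev
first = rng.random(n) < 0.5  # sensitive item shown in the first list
base1 = rng.binomial(4, 0.5, size=n)
base2 = rng.binomial(4, 0.5, size=n)

count1 = base1 + (first & holds)
count2 = base2 + (~first & holds)

# tau_1: sensitive-item effect estimated from the first list
tau_1 = count1[first].mean() - count1[~first].mean()
# tau_2: sensitive-item effect estimated from the second list
tau_2 = count2[~first].mean() - count2[first].mean()
did = tau_1 - tau_2  # near zero here, since the simulation has no carryover
```

Deflation or inflation would show up as a systematically negative or positive `did` across repeated samples.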

Application to Alvarez et al (2019)

Experiment                        Statistic   p-value
Organization X (advocacy group)   0.079       0.623
Organization Y (border patrol)    -0.268      0.082


Tests improve our ability to choose baseline items for a DLE

They also help when other things go wrong

Criminal governance tools in Uruguay

  • Facebook sample of Montevideo residents (N = 2688)

  • Four criminal governance strategies

Violent

  • Threaten neighbors
  • Evict neighbors

Non-violent

  • Make donations to neighbors
  • Offer work to neighbors


DLE with placebo item

Things people have experienced in the last six months:

List A                           List B
Saw people doing sports          Saw people playing soccer
Visited friends                  Chatted with friends
Activities by feminist groups    Activities by LGBTQ groups
Went to church                   Went to charity events
Gangs threatening neighbors      Did not drink mate

DLE with placebo item

Things people have experienced in the last six months:

List A                           List B
Saw people doing sports          Saw people playing soccer
Visited friends                  Chatted with friends
Activities by feminist groups    Activities by LGBTQ groups
Went to church                   Went to charity events
Did not drink mate               Gangs threatening neighbors

Prevalence estimates


What went wrong?

  • Placebo item more frequent than we anticipated

  • Offsets prevalence rates we would have observed

  • Solution: Reconstruct estimate bounds without placebo item

  • Challenge: Respondents may have noticed sensitive item and altered responses in unintended ways

  • Use tests to rule out strategic errors

Applying difference-in-differences test

Sensitive item       Statistic   p-value
Threaten neighbors   0.12        0.41
Evict neighbors      0.08        0.58
Make donations       -0.24       0.16
Offer work           -0.11       0.47


Observed test statistics not unlikely under the null hypothesis of no carryover design effect

So we can move from this

To estimate bounds


Conclusion

Summary

  • DLEs improve along bias-variance frontier
  • But bring additional questions about validity
  • New tools to address validity
  • Applications: Justify research design and recover estimates

Conclusion

Lessons

  • Seemingly cost-free innovations may have hidden costs
  • Need tools to incorporate into workflow
  • Pilot, pilot, pilot!
  • Adaptive rules before confirmatory analyses

Appendix

Alvarez et al (2019) details

Placement of the sensitive item

Sensitive item   List A   List B
Organization X   545      525
Organization Y   537      543

Mean supported organizations

Control list distributions

Montevideo survey


Stephenson’s signed rank test

\[ \widetilde{T} = \sum_{i=1}^N \text{sgn} \{(z_i - (1-z_i)) (Y_{i1} - Y_{i2})\} \times \tilde{q}_i \]

\[ \tilde{q}_i = \begin{cases} \binom{q_i-1}{m-1} & \text{for } q_i \geq m \\ 0 & \text{for } q_i < m \end{cases} \quad \text{with } 1 \leq m \leq N \]
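The statistic above is straightforward to compute. A Python sketch, assuming \(q_i\) is the rank of \(|Y_{i1} - Y_{i2}|\) (the slide leaves \(q_i\) undefined; that reading, and breaking ties by position, are assumptions):

```python
import numpy as np
from math import comb

def stephenson_stat(y1, y2, z, m=5):
    """Stephenson-type signed-rank statistic.

    y1, y2: counts for the two lists; z: 1 if the sensitive item
    appeared in the first list, 0 otherwise. q_i is taken to be the
    rank of |y1 - y2|, with ties broken by position (an assumption).
    """
    y1, y2, z = map(np.asarray, (y1, y2, z))
    d = y1 - y2
    # sgn{(z - (1 - z)) * d} = sign((2z - 1) * d); zero differences drop out
    sgn = np.sign((2 * z - 1) * d)
    q = np.argsort(np.argsort(np.abs(d))) + 1  # ranks 1..N
    q_tilde = np.array([comb(qi - 1, m - 1) if qi >= m else 0 for qi in q])
    return float((sgn * q_tilde).sum())
```

Larger `m` puts weight only on the largest absolute differences, which is what makes the test sensitive to shifts in the tail of the response distribution.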

Applied to Alvarez et al (2019)

       Organization X            Organization Y
m      Statistic     p-value     Statistic     p-value
2      8.3564e+04    1           3.5713e+04    1
5      3.8093e+12    1           3.3231e+12    1
10     1.7916e+23    1           1.8258e+23    1
50     1.4394e+86    1           2.5339e+86    1

DLEs are costly to implement

Test performance

Signed rank performance

Placebo tests

           Placebo I                  Placebo II
Item       Estimate   p      n       Estimate   p      n
Donate     0.08       0.55   133     -0.03      0.32   635
Evict      0.14       0.46   32      0.00       0.95   628
Threaten   0.24       0.08   102     0.02       0.44   641
Work       0.03       0.88   64      0.01       0.58   647

Network scale-up method

How many X do you know, who also know you, with whom you have interacted in the last year in person, by phone, or any other channel?

Network scale-up method

How many X do you know…

  • From Las Piedras
  • Male 25-29
  • Police officers
  • University students
  • Had a kid last year
  • Passed away last year
  • Married last year
  • Female 45-49
  • Public employees
  • Welfare card holders
  • Registered with party
  • With kids in public school
  • Did not vote in last election
  • Currently in jail
  • Recently unemployed
  • SENSITIVE ITEM
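The scale-up logic behind these questions: contacts in groups of known size pin down each respondent's personal network size, and reported contacts in the hidden group are then scaled by those network sizes. A minimal Python sketch of the basic estimator (all population numbers are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1_000_000                                   # assumed total population
known_sizes = np.array([5_000, 20_000, 8_000])  # assumed known-group sizes

n_resp = 500
true_degree = 300  # assumed average personal network size

# Reported contacts in each known group ~ Binomial(degree, group share)
reports = rng.binomial(true_degree, known_sizes / N, size=(n_resp, 3))

# Basic scale-up: degree_i = N * (known-group contacts / total known-group size)
degree_hat = N * reports.sum(axis=1) / known_sizes.sum()

# Hidden group: scale total reported contacts by total estimated degrees
hidden_share = 0.002  # assumed true share with the sensitive trait
hidden_reports = rng.binomial(true_degree, hidden_share, size=n_resp)
hidden_size_hat = N * hidden_reports.sum() / degree_hat.sum()
```

`hidden_size_hat` recovers roughly `N * hidden_share` under these idealized assumptions (random mixing, accurate recall), which real NSUM applications have to defend or adjust for.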

NSUM results